-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convey uncertainty via tip colors #1796
Conversation
Consuming code expected this to be a boolean and all actions which update this state set a boolean. Thankfully this means the default state was never used in practice.
.clamp(true); | ||
const tipOpacityFunction = branchOpacityFunction | ||
.copy() | ||
.range([0, 0.9]); // if entropy close to 0 return the original node color |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range (and the domain) of this scale are the magic values which control how entropy values affect the stroke colour (via an interpolation between the original colour & grey). The tip fill colour is a brighter version of the stroke colour. Very happy for adjustments here, although I think it's important that tipOpacityFn(close-to-zero) -> 0
so that we don't change how the majority of datasets appear.
57737fc
to
f9e654f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would one be able to differentiate the grey scale for uncertainty vs grey scale for unprovided colorings? For example, imagine if zika's region had uncertainty, it would be mixed in with the "Asia" grey colorings.
(The default "Asia" coloring issue will be fixed when augur is released with nextstrain/augur#1490, but the question still stands for any other unprovided colorings)
* @param {bool} confidence enabled? | ||
* @return {array} array of hex's. 1-1 with nodes. | ||
*/ | ||
export const calcBranchStrokeCols = (tree, confidence, colorBy) => { | ||
export const calculateStrokeColors = (tree, branch, confidence, colorBy) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a request for change, just curious
Why combine branch/tip colors into one function with the branch
flag when they are essentially completely different paths within the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While they are ~separate code paths, they both do the same thing: take a colour and modify it according to the node's uncertainty. Co-locating them feels natural to me and should help them to stay in-sync.
I actually thought about taking this further and refactoring it into a function calculateNodeColors -> {branchColors, tipStrokeColors, tipFillColors}
so we calculate everything at once, but I'll leave that for another day (and I need to check that we never have a situation where we only need to recompute one of those sets).
Awesome! This behavior looks spot on to me. Here's the current H5N1 cattle outbreak Note that recent SRA tips are not completely gray. For example, the top clade descends from viruses sampled from South Dakota. These are appropriately colored a gray/green indicating potential South Dakota, but with little certainty. The interpolation between SRA tips close to known Ohio viruses in blue to the the Michigan human case in lime also seems very appropriate. I think Auspice is now doing exactly what it should be doing. However, we still should have a way to have a more data-informed decision about how to set |
This is a really good point @joverlee521. The issue is that currently we use gray to mean either:
This uninteresting take can be seen here https://nextstrain.org/ncov/gisaid/north-america/6m@2020-05-01 for example. This felt semantically appropriate to distinguish focal samples from background samples. I think this is okay however... This example does DTA on samples with a focal vs contextual color ramp so that uncertain nodes and contextual nodes are both gray. This feels okay and appropriate (perhaps not ideal, but not broken). It highlights clades that are more certain to be in a focal region. That said, we should be fixing colorings like the Zika example so that random location is not gray. In the Zika example, "Asia" should be blue, like it is for country. |
The previous code conveyed uncertainty in node attrs for _branches_ by making them appear grey-er, but we never implemented this for _tips_; most likely because we never had a dataset with such data when this was built. Here we use the same approach for tips as for branches, but with a slightly different parameterisation of the interpolation. The mapping of the entropy value into `[0,1]` (`tipOpacityFunction`) was chosen so that tips with no (or very little) uncertainty look unchanged from previous Auspice versions, and uncertainty makes them appear more similar to the branch colour (for an equivalent uncertainty).
f9e654f
to
c2ffa90
Compare
Very difficult at the moment! Let's continue discussion in [maybe] differentiate between nodes with uncertainty vs nodes missing from colour scale |
The previous code conveyed uncertainty in node attrs for branches by making them appear grey-er, but we never implemented this for tips; most likely because we never had a dataset with such data when this was built.
Here we use the same approach for tips as for branches, but with a slightly different parameterisation of the interpolation. The mapping of the entropy value into
[0,1]
(tipOpacityFunction
) was chosen so that tips with no (or very little) uncertainty look unchanged from previous Auspice versions, and uncertainty makes them appear more similar to the branch colour (for an equivalent uncertainty).There should be no visible changes for views without any uncertainty (genotype is a good one to use to test this), as well as traits where there is uncertainty in the dataset but not in the tips (e.g. ebola country / division reconstruction). Here's a side-by-side with the h5n1-cattle-flu dataset from nextstrain/avian-flu#66, which identified this issue in Auspice (this PR LHS, current Auspice RHS):